Project Title

INFO 526 - Summer 2024 - Final Project

Project description
Author
Affiliation

Team name

School of Information, University of Arizona

Introduction:

This project uses the data set published for the week of January 23, 2024 by the TidyTuesday project. The data set focuses on the educational attainment of young people in English towns. Provided by the UK Office for National Statistics, it explores why children and young people in smaller towns often perform better academically than those in larger towns. The data include a variety of metrics such as population, regional classifications, coastal status, job density, income levels, university presence, and detailed educational outcomes at various stages, from Key Stage 2 through age 22.

I chose to use this data set because, upon reviewing it, I immediately envisioned the variety of compelling visuals that could be created from the rich and detailed information it contains. The diverse metrics, ranging from educational outcomes to economic factors, present numerous opportunities for insightful data visualizations.

For my analysis, I will examine how university presence and region relate to the percentage of 19-year-olds in apprenticeships. This question interests me because it combines educational infrastructure with geographic diversity, potentially revealing how the presence of universities and the specific characteristics of different regions influence the career paths young people choose. My initial hypothesis is that the absence of a university pushes more 19-year-olds into apprenticeships, because pursuing a university education is a less available option.

Approach:

I begin with a simple graph to get a first look at the data: a scatter plot built with geom_point.
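A first-look scatter plot of this kind might be sketched as follows. This is a minimal illustration, not the project's actual code: the data frame `towns` and its columns `region`, `university_flag`, and `apprenticeship_pct` are hypothetical stand-ins filled with randomly generated values, and the choice of region on the x-axis is an assumption. The real column names should be taken from the data dictionary.

```r
library(ggplot2)

# Toy stand-in data: values are randomly generated for illustration only,
# and the column names are hypothetical, not the data set's real ones.
set.seed(526)
towns <- data.frame(
  region = rep(c("North East", "North West", "South East"), each = 20),
  university_flag = rep(c("University", "No university"), times = 30),
  apprenticeship_pct = runif(60, min = 5, max = 25)
)

# One point per town: region on x, apprenticeship percentage on y,
# colored by whether the town has a university.
p_points <- ggplot(
  towns,
  aes(x = region, y = apprenticeship_pct, color = university_flag)
) +
  geom_point() +
  labs(
    x = "Region",
    y = "19-year-olds in apprenticeships (%)",
    color = "University presence"
  )

print(p_points)
```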

The geom_point plot gives me an idea of the data; however, there is too much going on in this visualization. I also suspect that a lot of data is hidden by overlapping points. The next step is to look at the distribution of the data by creating a box plot with geom_boxplot.
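The move from points to distributions could look roughly like the sketch below. As before, `towns` and its columns are hypothetical stand-ins with randomly generated values, so the boxes it draws say nothing about the real data. Grouping the boxes by university flag is an assumption about how the comparison was set up.

```r
library(ggplot2)

# Toy stand-in data: randomly generated values and hypothetical column
# names, used only to make the example runnable.
set.seed(526)
towns <- data.frame(
  region = rep(c("North East", "North West", "South East"), each = 20),
  university_flag = rep(c("University", "No university"), times = 30),
  apprenticeship_pct = runif(60, min = 5, max = 25)
)

# One box per university flag: the box summarizes the median and
# quartiles, so overlapping points no longer hide the distribution.
p_box <- ggplot(
  towns,
  aes(x = university_flag, y = apprenticeship_pct)
) +
  geom_boxplot() +
  labs(
    x = "University presence",
    y = "19-year-olds in apprenticeships (%)"
  )

print(p_box)
```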

This gives a better representation of the data set. We can see that the median percentage of 19-year-olds in apprenticeships is greater in towns without a university than in towns with one. A lot of data is still clustered together in this visualization, so I use facet_grid to create columns by region and rows by the university flag. This visualization makes it easier to compare the data across all the variables of interest.
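The faceted layout described here, with regions as columns and the university flag as rows, can be sketched with facet_grid as shown below. The data frame and column names are again hypothetical, and the values are randomly generated, so only the layout is meaningful.

```r
library(ggplot2)

# Toy stand-in data: randomly generated values and hypothetical column
# names, used only to make the example runnable.
set.seed(526)
towns <- data.frame(
  region = rep(c("North East", "North West", "South East"), each = 20),
  university_flag = rep(c("University", "No university"), times = 30),
  apprenticeship_pct = runif(60, min = 5, max = 25)
)

# One box per panel: rows split by university flag, columns by region.
# x is a constant so each panel holds a single box.
p_facets <- ggplot(towns, aes(x = "", y = apprenticeship_pct)) +
  geom_boxplot() +
  facet_grid(rows = vars(university_flag), cols = vars(region)) +
  labs(x = NULL, y = "19-year-olds in apprenticeships (%)")

print(p_facets)
```

Because every panel shares the same y-axis by default, medians and spreads can be compared directly across regions and between the two rows.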

Discussion:

Visualizing the data set revealed several interesting points. The towns with the highest percentage of 19-year-olds in apprenticeships are towns without a university in the North West region. However, these towns are outliers; the region with the greatest median percentage of 19-year-olds in apprenticeships is the North East. Another interesting observation is that the spread of the distributions is greater for towns without a university and narrower for towns with a university.
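Observations about medians and spread like these can also be checked numerically. The sketch below computes the median and interquartile range for each combination of region and university flag using base R. It runs on hypothetical, randomly generated stand-in data, so its numbers do not reproduce the findings above. The interquartile range is used here as one possible measure of spread, which is an assumption rather than the project's stated method.

```r
# Toy stand-in data: randomly generated values and hypothetical column
# names, used only to make the example runnable.
set.seed(526)
towns <- data.frame(
  region = rep(c("North East", "North West", "South East"), each = 20),
  university_flag = rep(c("University", "No university"), times = 30),
  apprenticeship_pct = runif(60, min = 5, max = 25)
)

# Median apprenticeship percentage per region and university flag.
medians <- aggregate(
  apprenticeship_pct ~ region + university_flag,
  data = towns, FUN = median
)
names(medians)[3] <- "median_pct"

# Interquartile range per group, as a simple measure of spread.
spreads <- aggregate(
  apprenticeship_pct ~ region + university_flag,
  data = towns, FUN = IQR
)
names(spreads)[3] <- "iqr_pct"

# Combine both summaries and sort by median, highest first.
group_summary <- merge(medians, spreads, by = c("region", "university_flag"))
group_summary <- group_summary[order(-group_summary$median_pct), ]

print(group_summary)
```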